
    jMOTU and Taxonerator: Turning DNA Barcode Sequences into Annotated Operational Taxonomic Units

    BACKGROUND: DNA barcoding and other DNA sequence-based techniques for investigating and estimating biodiversity require explicit methods for associating individual sequences with taxa, as it is at the taxon level that biodiversity is assessed. For many projects, the bioinformatic analyses required pose problems for laboratories whose prime expertise is not in bioinformatics. User-friendly tools are required for both clustering sequences into molecular operational taxonomic units (MOTU) and for associating these MOTU with known organismal taxonomies. RESULTS: Here we present jMOTU, a Java program for the analysis of DNA barcode datasets that uses an explicit, determinate algorithm to define MOTU. We demonstrate its usefulness for both individual specimen-based Sanger sequencing surveys and bulk-environment metagenetic surveys using long-read next-generation sequencing data. jMOTU is driven through a graphical user interface, and can analyse tens of thousands of sequences in a short time on a desktop computer. A companion program, Taxonerator, that adds traditional taxonomic annotation to MOTU, is also presented. Clustering and taxonomic annotation data are stored in a relational database, and are thus amenable to subsequent data mining and web presentation. CONCLUSIONS: jMOTU efficiently and robustly identifies the molecular taxa present in survey datasets, and Taxonerator decorates the MOTU with putative identifications. jMOTU and Taxonerator are freely available from http://www.nematodes.org/
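
The idea of defining MOTU with an explicit cutoff can be illustrated with a minimal sketch. This is not jMOTU's actual implementation: it assumes the barcodes are already aligned to equal length, uses a plain count of mismatching bases as the distance, and applies single-linkage clustering. The names `base_differences`, `cluster_motu`, and the toy `barcodes` are hypothetical.

```python
from itertools import combinations


def base_differences(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def cluster_motu(seqs: dict[str, str], cutoff: int) -> list[set[str]]:
    """Single-linkage clustering: link two sequences if they differ by <= cutoff bases."""
    parent = {name: name for name in seqs}

    def find(name: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if base_differences(seqs[a], seqs[b]) <= cutoff:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Toy data (assumption): four 10-base "barcodes".
barcodes = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",  # 1 difference from s1
    "s3": "TTGTACGAAC",  # 3 differences from s1
    "s4": "TTGTACGAAG",  # 1 difference from s3
}
motus = cluster_motu(barcodes, cutoff=2)
```

With a cutoff of 2 bases the toy data fall into two clusters, `{s1, s2}` and `{s3, s4}`; raising the cutoff to 3 merges them into one, which shows how strongly the chosen cutoff shapes the resulting MOTU count. The all-pairs comparison is quadratic in the number of sequences, so a real tool handling tens of thousands of sequences would need something faster.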

    A method for studying protistan diversity using massively parallel sequencing of V9 hypervariable regions of small-subunit ribosomal RNA genes

    © 2009 The Authors. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in PLoS ONE 4 (2009): e6372, doi:10.1371/journal.pone.0006372. Massively parallel pyrosequencing of amplicons from the V6 hypervariable regions of small-subunit (SSU) ribosomal RNA (rRNA) genes is commonly used to assess diversity and richness in bacterial and archaeal populations. Recent advances in pyrosequencing technology provide read lengths of up to 240 nucleotides. Amplicon pyrosequencing can now be applied to longer variable regions of the SSU rRNA gene including the V9 region in eukaryotes. We present a protocol for the amplicon pyrosequencing of V9 regions for eukaryotic environmental samples for biodiversity inventories and species richness estimation. The International Census of Marine Microbes (ICoMM) and the Microbial Inventory Research Across Diverse Aquatic Long Term Ecological Research Sites (MIRADA-LTERs) projects are already employing this protocol for tag sequencing of eukaryotic samples in a wide diversity of both marine and freshwater environments. Massively parallel pyrosequencing of eukaryotic V9 hypervariable regions of SSU rRNA genes provides a means of estimating species richness from deeply-sampled populations and for discovering novel species from the environment. This work was supported by grants from the W.M. Keck Foundation and the Woods Hole Center for Oceans and Human Health from the National Institutes of Health and National Science Foundation (NIH/NIEHS 1 P50 ES012742-01 and NSF/OCE 0430724-J) (LAZ and SH).

    Prospecting environmental mycobacteria: combined molecular approaches reveal unprecedented diversity

    Background: Environmental mycobacteria (EM) include species commonly found in various terrestrial and aquatic environments, encompassing animal and human pathogens in addition to saprophytes. Approximately 150 EM species can be separated into fast and slow growers based on sequence and copy number differences of their 16S rRNA genes. Cultivation methods are not appropriate for diversity studies; few studies have investigated EM diversity in soil despite their importance as potential reservoirs of pathogens and their hypothesized role in masking or blocking M. bovis BCG vaccine. Methods: We report here the development, optimization and validation of molecular assays targeting the 16S rRNA gene to assess diversity and prevalence of fast- and slow-growing EM in representative soils from semi-tropical and temperate areas. New primer sets were also designed to uniquely target slow-growing mycobacteria and were used with PCR-DGGE, tag-encoded Titanium amplicon pyrosequencing and quantitative PCR. Results: PCR-DGGE and pyrosequencing provided a consensus of EM diversity; for example, a high abundance of pyrosequencing reads and DGGE bands corresponded to M. moriokaense, M. colombiense and M. riyadhense. As expected, pyrosequencing provided more comprehensive information; additional prevalent species included M. chlorophenolicum, M. neglectum, M. gordonae, M. aemonae. Prevalence of the total Mycobacterium genus in the soil samples ranged from 2.3×10⁷ to 2.7×10⁸ gene targets g⁻¹; slow-grower prevalence ranged from 2.9×10⁵ to 1.2×10⁷ cells g⁻¹. Conclusions: This combined molecular approach enabled an unprecedented qualitative and quantitative assessment of EM across soil samples. Good concordance was found between methods and the bioinformatics analysis was validated by random resampling. Sequences from most pathogenic groups associated with slow growth were identified in extenso in all soils tested with a specific assay, allowing them to be unmasked from the whole Mycobacterium genus, in which, as minority members, they would have remained undetected.

    Microbial community composition in sediments resists perturbation by nutrient enrichment

    Author Posting. © The Author(s), 2010. This is the author's version of the work. It is posted here by permission of Nature Publishing Group for personal use, not for redistribution. The definitive version was published in The ISME Journal 5 (2011): 1540–1548, doi:10.1038/ismej.2011.22. Functional redundancy in bacterial communities is expected to allow microbial assemblages to survive perturbation by allowing continuity in function despite compositional changes in communities. Recent evidence suggests, however, that microbial communities change both composition and function as a result of disturbance. We present evidence for a third response: resistance. We examined microbial community response to perturbation caused by nutrient enrichment in salt marsh sediments using deep pyrosequencing of 16S rRNA and functional gene microarrays targeting the nirS gene. Composition of the microbial community, as demonstrated by both genes, was unaffected by significant variations in external nutrient supply, despite demonstrable and diverse nutrient-induced changes in many aspects of marsh ecology. The lack of response to external forcing demonstrates a remarkable uncoupling between microbial composition and ecosystem-level biogeochemical processes and suggests that sediment microbial communities are able to resist some forms of perturbation. Funding for this research came from NSF (DEB-0717155 to JEH, DBI-0400819 to JLB). Support for the sequencing facility came from NIH and NSF (NIH/NIEHS-P50-ES012742-01 and NSF/OCE 0430724-J Stegeman PI to HGM and MLS, and WM Keck Foundation to MLS). Salary support provided from Princeton University Council on Science and Technology to JLB. Support for development of the functional gene microarray provided by NSF/OCE99-081482 to BBW. The Plum Island fertilization experiment was funded by NSF (DEB 0213767 and DEB 0816963).

    Analysis of 16S rRNA Amplicon Sequencing Options on the Roche/454 Next-Generation Titanium Sequencing Platform

    BACKGROUND: The 16S rRNA gene pyrosequencing approach has revolutionized studies in microbial ecology. While primer selection and short read length can affect the resulting microbial community profile, little is known about the influence of pyrosequencing methods on the sequencing throughput and the outcome of microbial community analyses. The aim of this study is to compare differences in output, ease, and cost among three different amplicon pyrosequencing methods for the Roche/454 Titanium platform. METHODOLOGY/PRINCIPAL FINDINGS: The following three pyrosequencing methods for 16S rRNA genes were selected in this study: Method-1 (standard method) is the recommended method for bi-directional sequencing using the LIB-A kit; Method-2 is a new option designed in this study for unidirectional sequencing with the LIB-A kit; and Method-3 uses the LIB-L kit for unidirectional sequencing. In our comparison among these three methods using 10 different environmental samples, Method-2 and Method-3 produced 1.5-1.6 times more usable reads than the standard method (Method-1) after quality-based trimming, and did not compromise the outcome of microbial community analyses. Specifically, Method-3 is the most cost-effective unidirectional amplicon sequencing method as it provided the most reads and required the least effort in consumables management. CONCLUSIONS: Our findings clearly demonstrated that alternative pyrosequencing methods for 16S rRNA genes could drastically affect sequencing output (e.g. number of reads before and after trimming) but have little effect on the outcomes of microbial community analysis. This finding is important for both researchers and sequencing facilities utilizing 16S rRNA gene pyrosequencing for microbial ecological studies.

    Global distribution and diversity of marine Verrucomicrobia

    Author Posting. © The Author(s), 2011. This is the author's version of the work. It is posted here by permission of Nature Publishing Group for personal use, not for redistribution. The definitive version was published in The ISME Journal 6 (2012): 1499-1505, doi:10.1038/ismej.2012.3. Verrucomicrobia is a bacterial phylum that is commonly detected in soil, but little is known about the distribution and diversity of this phylum in the marine environment. To address this, we analyzed the marine microbial community composition in 506 samples from the International Census of Marine Microbes as well as eleven coastal samples taken from the California Current. These samples from both the water column and sediments covered a wide range of environmental conditions. Verrucomicrobia were present in 98% of the analyzed samples and thus appeared nearly ubiquitous in the ocean. Based on the occurrence of amplified 16S rRNA sequences, Verrucomicrobia constituted on average 2% of the water column and 1.4% of the sediment bacterial communities. The diversity of Verrucomicrobia displayed a biogeography at multiple taxonomic levels and thus, specific lineages appeared to have clear habitat preference. We found that Subdivision 1 and 4 generally dominated marine bacterial communities, whereas Subdivision 2 was confined to low salinity waters. Within the subdivisions, Verrucomicrobia community composition was significantly different in the water column compared to sediment as well as within the water column along gradients of salinity, temperature, nitrate, depth, and overall water column depth. Although we still know little about the ecophysiology of Verrucomicrobia lineages, the ubiquity of this phylum suggests that it may be important for the biogeochemical cycle of carbon in the ocean. We would like to thank the UCI Undergraduate Research Opportunity Program (S.F.), the National Science Foundation (OCE-0928544 and OCE-1046297, A.C.M.) and the Alfred P. Sloan Foundation (S.H., D.M.W., M.S.) for supporting the work.

    SAMQA: error classification and validation of high-throughput sequenced read data

    Background: The advances in high-throughput sequencing technologies and growth in data sizes have highlighted the need for scalable tools to perform quality assurance testing. These tests are necessary to ensure that data is of a minimum necessary standard for use in downstream analysis. In this paper we present the SAMQA tool to rapidly and robustly identify errors in population-scale sequence data. Results: SAMQA has been used on samples from three separate sets of cancer genome data from The Cancer Genome Atlas (TCGA) project. Using technical standards provided by the SAM specification and biological standards defined by researchers, we have classified errors in these sequence data sets relative to individual reads within a sample. Due to an observed linearithmic speedup through the use of a high-performance computing (HPC) framework for the majority of tasks, poor quality data was identified prior to secondary analysis in significantly less time on the HPC framework than the same data run using alternative parallelization strategies on a single server. Conclusions: The SAMQA toolset validates a minimum set of data quality standards across whole-genome and exome sequences. It is tuned to run on a high-performance computational framework, enabling QA across hundreds of gigabytes of samples regardless of coverage or sample type.

    Interpreting 16S metagenomic data without clustering to achieve sub-OTU resolution

    The standard approach to analyzing 16S tag sequence data, which relies on clustering reads by sequence similarity into Operational Taxonomic Units (OTUs), underexploits the accuracy of modern sequencing technology. We present a clustering-free approach to multi-sample Illumina datasets that can identify independent bacterial subpopulations regardless of the similarity of their 16S tag sequences. Using published data from a longitudinal time-series study of human tongue microbiota, we are able to resolve within standard 97% similarity OTUs up to 20 distinct subpopulations, all ecologically distinct but with 16S tags differing by as little as 1 nucleotide (99.2% similarity). A comparative analysis of oral communities of two cohabiting individuals reveals that most such subpopulations are shared between the two communities at 100% sequence identity, and that dynamical similarity between subpopulations in one host is strongly predictive of dynamical similarity between the same subpopulations in the other host. Our method can also be applied to samples collected in cross-sectional studies and can be used with the 454 sequencing platform. We discuss how the sub-OTU resolution of our approach can provide new insight into factors shaping community assembly. Comment: Updated to match the published version. 12 pages, 5 figures + supplement. Significantly revised for clarity, references added, results not changed.
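
The similarity figures in this abstract can be checked with simple arithmetic. The sketch below is an illustration only: the abstract does not state the tag length, so the value of 125 nucleotides is an inference from "1 nucleotide (99.2% similarity)", and the function names `percent_identity` and `implied_length` are hypothetical.

```python
def percent_identity(length: int, mismatches: int) -> float:
    """Percent identity of two aligned tags of the given length with `mismatches` differing bases."""
    if length <= 0 or not 0 <= mismatches <= length:
        raise ValueError("need length > 0 and 0 <= mismatches <= length")
    return 100.0 * (length - mismatches) / length


def implied_length(mismatches: int, identity_percent: float) -> float:
    """Tag length at which `mismatches` differences give the stated percent identity."""
    if not 0.0 <= identity_percent < 100.0:
        raise ValueError("identity_percent must be in [0, 100)")
    return mismatches / (1.0 - identity_percent / 100.0)


# Assumed tag length, inferred from the abstract's figures rather than stated in it.
tag_length = round(implied_length(1, 99.2))

one_mismatch = percent_identity(tag_length, 1)
three_mismatches = percent_identity(tag_length, 3)
four_mismatches = percent_identity(tag_length, 4)
```

Under that assumed length, tags differing by up to three bases still fall inside a 97% OTU while a fourth mismatch drops below the threshold, which is why clustering at 97% can merge subpopulations that differ by only one or two nucleotides.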

    PanGEA: Identification of allele specific gene expression using the 454 technology

    Background: Next generation sequencing technologies hold great potential for many biological questions. While mainly used for genomic sequencing, they are also very promising for gene expression profiling. Sequencing of cDNA not only provides an estimate of the absolute expression level, it can also be used for the identification of allele specific gene expression. Results: We developed PanGEA, a tool which enables a fast and user-friendly analysis of allele specific gene expression using the 454 technology. PanGEA allows mapping of 454-ESTs to genes or whole genomes, displaying gene expression profiles, identification of SNPs and the quantification of allele specific gene expression. The intuitive GUI of PanGEA facilitates a flexible and interactive analysis of the data. PanGEA additionally implements a modification of the Smith-Waterman algorithm which deals with incorrect estimates of homopolymer length as occurring in the 454 technology. Conclusion: To our knowledge, PanGEA is the first tool which facilitates the identification of allele specific gene expression. PanGEA is distributed under the Mozilla Public License and available at: http://www.kofler.or.at/bioinformatics/PanGEA
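
The homopolymer problem mentioned in this abstract can be illustrated with a much simpler technique than the one PanGEA uses. The sketch below is not PanGEA's modified Smith-Waterman algorithm: it collapses every run of identical bases to a single base, so that two reads differing only in the length of a homopolymer run compare as equal. The function names are hypothetical.

```python
from itertools import groupby


def collapse_homopolymers(seq: str) -> str:
    """Replace each run of identical bases with a single base, e.g. 'AAACC' -> 'AC'."""
    return "".join(base for base, _run in groupby(seq))


def differ_only_in_homopolymer_length(a: str, b: str) -> bool:
    """True if the reads are not identical but become identical once runs are collapsed."""
    return a != b and collapse_homopolymers(a) == collapse_homopolymers(b)


collapsed = collapse_homopolymers("AAACCGTTTT")
same_after_collapse = differ_only_in_homopolymer_length("ACGTTTA", "ACGTTA")
real_substitution = differ_only_in_homopolymer_length("ACGTA", "ACCTA")
```

Collapsing discards genuine length differences along with sequencing errors, so it is only useful as a coarse check; an alignment that scores homopolymer-length gaps leniently, as the abstract describes, keeps that information.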